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Abstract. New supernova surveys such as the Dark Energy Survey, Pan-STARRS and the 
LSST will produce an unprecedented number of photometric supernova candidates, most 
with no spectroscopic data. Avoiding biases in cosmological parameters due to the resulting 
inevitable contamination from non-la supernovae can be achieved with the BEAMS formal- 
ism, allowing for fully photometric supernova cosmology studies. Here we extend BEAMS 
to deal with the case in which the supernovae are correlated by systematic uncertainties. 
The analytical form of the full BEAMS posterior requires evaluating 2^ terms, where N 
is the number of supernova candidates. This 'exponential catastrophe' is computationally 
unfeasible even for N of order 100. We circumvent the exponential catastrophe by marginal- 
ising numerically instead of analytically over the possible supernova types: we augment the 
cosmological parameters with nuisance parameters describing the covariance matrix and the 
types of all the supernovae, r^, that we include in our MCMC analysis. We show that this 
method deals well even with large, unknown systematic uncertainties without a major in- 
crease in computational time, whereas ignoring the correlations can lead to significant biases 
and incorrect credible contours. We then compare the numerical marginalisation technique 
with a perturbative expansion of the posterior based on the insight that future surveys will 
have exquisite light curves and hence the probability that a given candidate is a Type la 
will be close to unity or zero, for most objects. Although this perturbative approach changes 
computation of the posterior from a 2^ problem into an N'^ or one, we show that it leads 
to biases in general through a small number of misclassifications, implying that numerical 
marginalisation is superior. 
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1 Introduction 

Type la supernovae (SN la) are standardisable candles, making them one of the most reliable 
distance measures and a cornerstone of cosmology ever since the discovery of the late time 
accelerated expansion of the Universe [1, 2]. 

Future surveys which will produce large amounts of photometric data, such as the Dark En- 
ergy Survey (DES) [3], Pan-STARRS [4] and the Large Synoptic Survey Telescope (LSST) 
[5], will increase the number of SN la candidates by orders of magnitude. While a fool- 
proof method of identifying a Type la is to analyse its observed spectrum, taking spectra is 
expensive and, for surveys such as those mentioned above, it will be unfeasible to perform 
spectroscopic follow-up for all candidates, introducing a possible bias due to contamination 
from Type Ib/c and Type II supernovae, which we collectively denote non-la supernovae (SN 
nia) [6, 7]. However, using the photometric information gathered by the survey, one can fit 
template light curve models to the data using a template fitter such as MLCS2k2 [8], SALT2 
[9] or a model fitter such as in [10], which gives a probability for each object to be a Type la. 
In previous surveys, the relative SN la/nia probabilities were used only to determine candi- 
dates for spectroscopic follow-up [11]. Without spectroscopic follow-up, applying a probabil- 
ity cut (for example, taking all supernovae with probability greater than 0.9 to be a Type la) 
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will introduce a bias in the cosmological parameters [6]. To avoid such biases, one can either 
demand a very high purity, which excludes much of the data [12-14] or use all the data within 
a statistical framework that accounts for the contamination. One such method, developed by 
Kunz, Bassett & Hlozek [15] is Bayesian Estimation Applied to Multiple Species (BEAMS). 
BEAMS has recently been applied to the full three years of data from the SDSS-II supernova 
survey [11, 16], which reduced the — Q\ contours by a factor of three relative to the 
spectroscopic data alone [6]. 

Despite this success, the current implementation of BEAMS assumes the supernovae are not 
correlated with each other; an approach which will not be appropriate for future surveys. In 
general, systematic uncertainties are correlated and, as the number of discovered supernovae 
increases, we are entering an era where systematic uncertainties are comparable to statistical 
uncertainties. This means the off-diagonal terms of the covariance matrix cannot be ignored. 
To analytically account for correlations between supernovae in the BEAMS posterior requires 
summing over terms, where N is the number of supernova candidates and m is the num- 
ber of possible supernova types, which here we take to be two, corresponding to la's and 
nia's. Clearly, this is computationally impossible, but in this paper we will show that if the 
form of the covariance matrix is known (for example, if the known sources of correlations 
can be parameterised), BEAMS can still be used to estimate cosmological parameters in an 
unbiased way, using a numerical marginalisation over supernova type. 

After a brief review of the theory of BEAMS in section 2, we move on to discuss possible 
sources of correlations in section 3. We introduce mock supernova datasets described by 
three different covariance matrices in section 4. In section 5 we discuss a solution to the 
'exponential catastrophe' of the correlated form of the BEAMS posterior using numerical 
marginalisation of types and in section 6 we compare it with an alternative solution based 
on a perturbative expansion of the BEAMS posterior. 

2 BEAMS 

Cosmological parameter estimation usually proceeds by maximising the posterior, P{6\D), 
where D is the set of redshifts and distance moduli of spectroscopically confirmed Type 
la supernovae and 6 is the set of cosmological parameters, such as fi^n, and Hq. What 
happens if we do not have spectroscopic confirmation of an object's type but only a probability 
that it is a la? Unbiased parameter estimation in this case can be achieved using Bayesian 
Estimation Applied to Multiple Species (BEAMS). 

BEAMS [15] considers all data in a given sample, appropriately weighting each data point 
based on its probability of being a la. Let be the type of object i and = la if the object 
is a Type la and = nia if the object is not a la (for example, if it is a Type Ib/c or a Type 
II supernova). Then we can write the posterior, P{0\D) (here 0, D and r are understood to 
be vectors), as 



This sum marginalises over all possible combinations of types for the dataset so r here is a 
length- vector (where N is the number of objects). For example, if there were three objects 
in the dataset, the first term in the sum would have ti = la, T2 = la, T3 = la, the second 
term would have ti = nIa, T2 = la, T3 = la etc. Thus, in general, for the case of two distinct 
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object types, this is a 2^ summation. Applying Bayes' theorem gives that 

P{e,T\D) = p{D\e,T)^^. (2.2) 

P{D)^ the Bayesian evidence, can be considered as a normahsation factor and ignored in 
further calculations. We will assume that P{6,r) = P{9)P{t) (see [15] for a discussion of 
this assumption). P(0) is the usual prior on the parameters (probability based on prior 
knowledge about the parameters), and P{t) can be written as 

n n ^^-Pk)- (2.3) 

This is the product of the probabilities, P^, for all the objects typed as a la, ri — la, multiplied 
by the product of (1 — Pi) for all the objects typed as nia's, ri — nia. Here we assume the 
data and the object types are uncorrelated (see [17] for details of BEAMS without this 
assumption). Thus the BEAMS posterior is given by 

p{e\D)^p{e)Y,P{D\e,T) J] Pj H (i-^^)- (2-4) 

T rj=Ia rfc=nla 

As an order 2^ calculation, this is computationally unfeasible. In the case of uncorrelated 
data, there is a simplification we can use to make the problem tractable. By decomposing 
the likelihood as a product of probabilities, it can be shown [15] that the posterior can be 
written as 

p{e\D) (X p{e) \[ (ci,^i{e)p, + c^i,^,{i - pA , (2.5) 
i=i ^ ^ 

where >Cia,i is the likelihood assuming the i'th object is a la and >Cnia,i is the likelihood 
assuming it is a nIa. 

This formula works well for uncorrelated data but is unlikely to be accurate in the case of 
correlated data. For the large datasets which will be available in the future, correlations 
among the data cannot be ignored and another solution must be found to apply BEAMS to 
these datasets in a computationally feasible way. 



3 Correlated supernova data 

Correlations between supernovae has only recently become an issue that must be included in 
cosmological parameter sets (e.g. [18, 19]) but will become progressively more important as 
we push on the systematics floor related to SNIa surveys. Correlated systematic uncertainties, 
the focus of this paper, can arise from a large number of sources and a detailed study of 
correlations has yet to be undertaken due to the diversity and complexity of the various 
contributing factors. Schematically correlations can arise from: 

• Peculiar velocities: When supernovae are within 50 Mpc or so of each other the peculiar 
velocities of their host galaxies will be correlated by large-scale bulk flows. These 
peculiar velocities cause correlated redshift errors. Usually redshift errors are converted 
into additional distance modulus errors (see e.g. [20]) but even if this is not done it 
will cause errors in the recovered cosmological parameters. See, for example, [21-24]. 
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• Redshift- colour and redshift- stretch correlations: If there is no spectroscopic host galaxy 
redshift for an object, the redshift is estimated photometricahy either from the host 
or the supernova multi-band light curves. The effect of redshift on the light curve is 
degenerate to some extent with the stretch and colour corrections. Hence errors on 
redshift will correlate with those in the colour and stretch and thus with the estimated 
distance modulus of the supernova [25]. 

• Filter errors: Transmission curves for the actual filters used on a telescope approximate 
true through-put. Errors in measuring these transmission curves (or time dependent 
changes of the filters) [26] will tend to induce redshift-dependent correlations. For ex- 
ample, in [27], the authors consider the error in the flux calibration of the telescope 
which, since it affects each filter differently, correlates objects at similar redshifts. Zero- 
point photometry errors are a major source of uncertainty in current surveys that are 
common to all supernovae observed with the same telescope; e.g. [19]. 

• Template error correlations: Unaccounted for evolution of supernovae with redshift 
causes correlated errors due to errors in the light curves that form the training set. An 
example of this is the 'U-band anomaly' which causes discrepancies between the SALT2 
and MLCS2k2 light curve fitters which may be related to an excess of flux in the UV 
at high redshift in SNIa [20, 28]. 

• Observational conditions: Bad weather will cause holes in the light curve coverage 
of all supernovae visible at a given time, while seeing conditions will alter photometry 
measurements in a correlated way. These will induce subtle correlations between objects 
observed on the same night in similar conditions [5, 29]. 

• Combining data from multiple telescopes: The covariance matrix for combining data 
from multiple telescopes can be very complex, as discussed in the 3 year SNLS analysis 
[18]. With the exception of perhaps the final LSST dataset, combining data from 
multiple surveys will continue to be standard. 

• Gravitational lensing: Supernovae that are close together on the sky, at similar redshifts 
will experience similar brightening or dimming due to lensing, depending on the matter 
distribution along the line of sight. Future large surveys will have mass maps and hence 
will be able to predict and remove this signal to some extent [30-32]. 

• Dust: Supernovae along neighbouring lines of sight will suffer similar extinction from 
the Milky Way and from any intergalactic dust, which will induce correlations [33, 34]. 

• Host- galaxy correlations: There is now solid evidence that dispersion in the Hubble 
diagram correlates with the properties of host galaxies, particularly the galaxy type, 
size and mass. See, for example [35-38]. 

• Spectroscopic targeting correlations: Since spectroscopic follow-up is typically not ran- 
dom, there may be hidden correlations. For example, follow-up may favour candidates 
well-separated from the host galaxy core. Malmquist bias can also cause correlations 
which depend on the details of the spectroscopic survey [11, 20]. If there is an un- 
known systematic that such objects are intrinsically brighter/fainter than average, this 
will cause a correlated systematic error [11]. 
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• nia correlations: Many of the sources of correlation listed here will also affect nia's, 
causing correlations between their distance moduli. However, these correlations will 
typically be much smaller than the intrinsic dispersion of the nIa population. 

Given the complexity of these various effects, developing a 'realistic' correlation matrix is 
beyond the scope of this work and will need to be laboriously built from simulations and 
detailed studies. Here we wish instead to develop ways of dealing with the general problem 
posed by the 2^ 'exponential catastrophe' that correlations present. We use several toy 
model covariance matrices to study potential resolutions of this catastrophe, finding that 
numerical marginalisation over the supernova types performs best. 



4 Mock data 



In order to determine the effect that correlated data can have on parameter estimation with 
BEAMS, we use mock supernova datasets, the properties of which are known. This enables 
us to estimate the magnitude of the bias introduced by correlations between the data points 
and to determine the optimum way to handle this additional source of error. 
To simulate supernova data, we need to create a distance modulus for each object. The 
distance modulus is defined as 

f dL 



Mz) = m - M = 51ogio (^^^ j + 25, (4.1) 

where m is the apparent magnitude of the object, M is the absolute magnitude of the object 
and dL is the luminosity distance to the object in Mpc [39]. Through the distance modulus, 
SN la can be used to constrain cosmological parameters. The luminosity distance depends 
on the cosmological parameters (assuming a ACDM model) by 



dL{z) - 



Ho\/—flk 
where 



^in(^OA/^/^), (4.2) 



1/2 

H{z) = Ho ( + z)' + l^A + Qkil + zy] , (4.3) 



and Hq is the Hubble constant, is the energy density of matter, Q\ is the energy density 
of dark energy and fl^ is the energy density of curvature (all densities relative to the critical 
density) [39]. We used a ACDM model to generate the mock data with the following param- 
eters: Ho = 70.4 km/s/Mpc, = 0.272 and = 0.628 (values taken from [40]). 

SN la typically have very little scatter in their distribution of distance moduli, while nIa's 
tend to be widely scattered. SN la also tend to be brighter than nIa's. As such, we model 
the mock data as drawn from two populations: the la distribution is a narrow Gaussian 
centred on the fiducial cosmological distance modulus, fi(z)^ with standard deviation aia- 
The nIa distribution is a wide Gaussian, centred on fi(z) + 6, where 6 is a constant shift^. 
The nIa distribution has standard deviation cTnia- While in general one could model each 
known type of supernova as a separate population, two populations are sufficient for this 
work. For each object, redshifts in the range of 0.01 < z < 2.0 were drawn from a uniform 



^It was shown in [6] that BEAMS is able to reconstruct the correct redshift evolution for the nIa distribution. 
Here we assume, without loss of generality, that 6 is a constant shift. 
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Figure 1. Distance modulus (/i) as a function of redshift for a typical mock dataset with 200 super- 
novae. Blue points are SN la, green triangles are SN nIa and the red line is the fiducial cosmological 
model. The dotted line is the model for the nia's. The intrinsic dispersion of the la's is aja = 0.1 and 
that of the nIa's is dnia = 1-5, as indicated by the error bars. We show the true types and intrinsic 
scatter of the points for clarity here but note that neither are given to BEAMS, which receives only 
the probabilities and the /i values as input. 

random distribution, while probabilities, P^, were drawn from a distribution that simulates 
the expected Dark Energy Survey probability distribution, shown in figure 2. There were 
roughly equal numbers of la's and nIa's. Here we assume that the probabilities are correct, 
but it was shown in [6, 15] that BEAMS can deal with biased probabilities in general. 



2500 




LO 0.2 0.4 0.6 0.8 1.0 

Probability 



Figure 2. Histogram of Dark Energy Survey-like SNIa probabilities, where the supernova data was 
generated using SNANA and fitted with MLCS2k2 to obtain the probabilities. The blue (dark), 
outlined bars are la's and the red (light) bars are nIa's. This figure illustrates the point that future 
data will deliver bimodal probabilities peaked near and 1, and helps guide the probability distribution 
for our mock data. 
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The type of each object was randomly chosen with probabihty Pi which ahowed the object 
to be assigned a distance modulus. Residuals were drawn from the appropriate distribu- 
tion, depending on type, correlated using one of the covariance matrices described below and 
added to the appropriate la or nia mean. A typical mock dataset distance modulus diagram 
is shown in figure 1, where we have chosen the standard deviation of the la population to be 
cTia = 0.1, that of the nia's to be (Jnia = 1-5 and the shift of the nIa population to be 6 = 2.0. 
In this paper we experimented with datasets varying between 200 and 1000 supernovae. 
Since the exact form of the true covariance matrix for supernovae is unknown, we use simple 
toy models for a covariance matrix for the mock data. Our main goal is to show how BEAMS 
can be used for correlated data, so we do not attempt to model a realistic covariance matrix. 
We analyse the effects of three different covariance matrices. In all cases, we order the objects 
according to redshift, since we expect the strongest correlations to be functions of redshift. 
As they will come from the same sources, we expect la-nia and nia-nia correlations to be 
of similar magnitude as la-Ia correlations. However, since the dispersion of the nIa distribu- 
tion is so large, these small correlations have little influence and only the la-Ia correlations 
affect the results. We found that when we created a dataset with correlations between all 
supernova types and analysed it assuming only la correlations, the results were identical to 
an analysis which included non-zero nIa-nIa and la-nia correlations. Hence, we only consider 
la-Ia correlations in our analysis. 

4.1 Wedding cake covariance matrix 

This example matrix is based on the supernova covariance matrix in Kim et al. [41], which 
is guaranteed to be positive-definite. This matrix has a wedding cake or step-like structure: 
an error entered at a given redshift contributes to the error of all higher redshift objects. 
This structure naturally arises as one loses observational features at higher redshifts thus 
introducing errors. We constructed a matrix based on this, in which we divided the data set 
into five redshift bins, adding an extra source of error in each bin. This structure is given as 

Cij = aiaj6ij + Vij, (4.4) 
where = aja if T^=Ia and = dnia if T^=nla and 

^0 otherwise, 

where riij is the bin to which the object belongs. To produce the step-like structure, riij = 
^mm(^j ^ (where "[J" indicates the floor function, rounding down to the nearest integer). 
For this covariance matrix we used Sk = 0.015 for A: = 1 to 5, as illustrated in the top left 
panel of figure 3. 

4.2 Decaying Covariance Matrix 

This covariance matrix assumes positive correlations between objects which are nearby in 
the covariance matrix, with the correlations decaying as the distance between the indices of 
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objects is increased. The exact form of this covariance matrix is 




if i=j & ^i—^j — 
if i=j & ^i—^j — nia 
if i^^j & ^i—^j — 
otherwise 



(4.6) 







This is ihustrated in the middle left panel of figure 3, where we set x = 0.7. 
4.3 Block-diagonal covariance matrix 

To illustrate our method with a more realistic example, we used the covariance matrix for 
the Union2 sample of 557 supernovae. [19, 42] constructed this matrix by parameterising 
all known sources of correlations for the Union2 dataset, fitting these nuisance parameters 
simultaneously with the cosmological parameters and providing an estimate of the best-fit 
covariance matrix. We binned into 11 redshift bins, which we then applied to our mock data. 
An example for one mock data realisation can be seen in the bottom left panel of figure 3. We 
have set nia-nia and nia-la correlations to zero, since these should be very small compared 
to the intrinsic dispersion of the nia's. We added a af term to the diagonal where = aja if 
object z is a la and ai = a^u if it is a nia. The block diagonal structure arises largely from 
the fact that supernovae from the same survey are correlated with each other. 
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5 Numerical marginalisation over supernova type 



5.1 Theory 

A solution to the computational problem of handling correlated data would be to perform 
the marginalisation over the types numerically instead of analytically, allowing us to use the 
correlated BEAMS posterior without the 2^ sum. In order to do this, we create N discrete 
nuisance parameters, the types of the supernovae, r^, and marginalise over these parameters 
in our Markov Chain Monte Carlo (MCMC) [43, 44] analysis. This problem is similar to the 
Ising spin problem because these parameters are discrete and can only assume one of two 
values [45]. 

In each step of the MCMC chain, we randomly select one object and set the corresponding 
Ti =Ia with probability P^, which significantly speeds up convergence when compared with 
varying all the types each step. To further improve convergence, we choose the initial type 
parameters based on the ratio of la to nia uncorrelated likelihoods for each object. If the 
ratio is greater than one, we set the initial type for that object to a la, otherwise to a 
nIa. To compute this initial likelihood ratio, we use some fiducial values for the cosmological 
parameters. This initial choice for the types has little bearing on the final result, but without 
it, the chain typically starts with a very low likelihood because many objects have the wrong 
type and it takes much longer to reach the region of the peak. Thus, if represents the set 
of parameters in the MCMC chain and r the types, the usual MCMC acceptance criterion is 



We applied this technique using MCMC to estimate cosmological parameters from the 
mock datasets. We ran at least three MCMC chains for each dataset produced and ensured all 
chains were converged using the Gelman- Rubin [46] criterion for convergence, with i? < 1.01. 
Chains were, on average, 60 000 steps long after burn-in was removed. It should be noted 
that for all MCMC chains, the following flat, wide priors were used, to ensure unbiased 
parameter estimation (since small datasets were used to reduce computational time, which 
tend to have large contour areas): —0.2 < < 1-2, —0.2 < Q\ < 1.2 and 10 < i^o < 130. 
We allowed the shift, 6, of the nIa population to vary with a flat prior of 1 < 6 < 3 and 
the standard deviations of the populations, aja and (Jnia to vary in log space with priors of 
—3.0 < log(cria) < — 1.2 and —0.7 < log(ania) < 1-5. To allow for the most general case where 
only the form of the covariance matrix is known, but its parameters are not, we also varied the 
covariance parameters for each individual covariance matrix over a flat prior, except for the 
block covariance matrix whose off-diagonal terms remained fixed, since they were determined 
from the Union2 dataset. 

5.2 Testing BEAMS with numerical marginalisation 

We first generated uncorrelated datasets to directly compare the numerical and analytic 
marginalisation of BEAMS. We found that the — Q\ credible contours^ produced using 
eq.(2.5) and those produced using the numerical marginalisation technique were identical for 
the uncorrelated datasets, showing that the technique works as expected. 



^In this paper, we use credible intervals, the Bayesian analogue to confidence intervals. Bayesian cred- 
ible intervals represent the degree of belief in the parameter estimates, and are derived from the posterior 
probability. They are not, in general, equivalent to frequentist confidence intervals [47]. 
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It is interesting to note that the marginahsation method is faster than uncorrelated BEAMS, 
since there are half the calculations to perform at each step (uncorrelated BEAMS com- 
putes the likelihood assuming the object is nia and la respectively, whereas the numerical 
marginahsation only requires the likelihood given the current types). However, it takes longer 
for the correlated BEAMS chains to converge. On average, using the Gelman- Rubin [46] cri- 
terion for convergence, uncorrelated BEAMS chains converge within 10 000 steps whereas 
the correlated BEAMS chains only converge after about 40 000 steps. We tested datasets 
of up to 1000 data points in size and found that, although the amount of time taken to do 
a single likelihood calculation increases linearly, the number of steps taken to converge does 
not increase appreciably as the number of data points is increased. Hence, computational 
complexity will not be an issue for correlated BEAMS. 

Figure 3 compares the uncorrelated BEAMS approach with correlated BEAMS (using nu- 
merical marginahsation of types) in the case of correlated data, based on the three different 
covariance matrices from section 4. The uncorrelated approach includes no information about 
correlations and hence it is not surprising that it is biased at > 2a for both the decaying 
covariance matrix and the wedding cake covariance matrix. In contrast, correlated BEAMS 
with numerical marginahsation correctly estimates the cosmological parameters without any 
bias. There is less of an effect from the block diagonal covariance matrix, because both the 
correlations and the dataset are small. Although even in this case, it is clear the uncorrelated 
BEAMS contours appear to be mildly biased. 
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Figure 3. Results of the numerical marginalisation of types with BEAMS using three different co- 
variance matrices: The left column shows schematic views of the three covariance matrices, where 
the colour indicates the amount of correlation between two data points. The right column shows the 
— contours, marginalising over i^o, when the correlated data sets are analysed assuming no 
correlations (dotted Hues) and using correlated BEAMS (filled contours). The covariance matrices 
and their parameters are (from top to bottom): the wedding cake covariance matrix (s/c =0.015 for 
k=l to 5), the decaying covariance matrix {x=0.7) and the block diagonal covariance matrix. All 
covariance parameters (for the wedding cake and decaying matrices) are marginalised over with wide, 
fiat priors. Note: for all matrices, the diagonal has been removed for clarity because dnia is so much 
larger than the off-diagonal terms. 



5.3 Testing the Contours and Coverage Properties of the BEAMS Estimators 



To test the accuracy of the correlated BEAMS contours, we created 5000 datasets, each 
consisting of 200 correlated supernovae, using the wedding cake covariance matrix. We then 
ran a 100 000 step MCMC chain for both uncorrelated and correlated BEAMS on each of 
the datasets, totaling about a billion MCMC steps. Figure 4 shows scatter plots for both 
uncorrelated BEAMS and correlated BEAMS. Each point represents the maximum posterior 
values of the parameters for one of the 5000 datasets. We computed the mean squared error 
(MSE), which is the sample average of the squared distance between the estimates and the 
true value for the parameters (in this case and I^a), for both uncorrelated and correlated 
BEAMS. We found the relative efficiency, defined to be the ratio of the MSE for uncorrelated 
BEAMS to that of correlated BEAMS, to be 2.3, which implies that correlated BEAMS is a 
much more efficient estimator than uncorrelated BEAMS for correlated data, i.e. it estimates 
parameters with much less scatter. 

We also plotted the 95% credible contours for five randomly selected datasets to show how 
the size and shape of the contours are underestimated by uncorrelated BEAMS but well 
estimated by correlated BEAMS, for these correlated datasets. To quantify this point, we 
computed the coverage, which we here define to be the proportion of the 5000 datasets where 
the true value (marked by the black circle in figure 4) lies within the 95% credible interval 
for each dataset derived from the corresponding MCMC chain for that dataset. We found 
that the coverage was 88.2% for correlated BEAMS and only 7.2% for uncorrelated BEAMS, 
showing that accounting for the correlations is crucial in getting the correct contours. Ideally, 
the coverage should be close to 95%. However, the reader is cautioned that coverage is a 
frequentist concept and only for asymptotically large datasets would we expect the frequentist 
coverage to coincide with the corresponding coverage from Bayesian credible intervals. As 
our datasets only have 200 points each, we should not be surprised if the coverage is not 
exactly 95%. Undercoverage occurs even for the standard method applied to supernovae 
datasets (see figure 8 in [48]). 




Figure 4. Scatter plots of the maximum posterior estimates for the cosmological parameters for each 
of 5000 correlated datasets using the wedding cake covariance matrix (eq. 4.4). Left panel: BEAMS 
assuming the data are uncorrelated. Right panel: correlated BEAMS with numerical marginalisation 
over covariance matrix parameters and types. Also plotted are the 95% credible contours from five 
randomly selected datasets to give an idea of the size and shape of the contours. This shows that, 
while both methods are unbiased on average, correlated BEAMS is a much better estimator and has 
far superior coverage properties (88.2% vs. only 7.2%). 
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The coverage properties of both estimators are ihustrated in figure 5. For both uncorre- 
lated and correlated BEAMS we plotted the effective coverage (on the y-axis), i.e. the fraction 
of datasets for which the given credible contour contains the input cosmological parameters, 
at that level of credibility (on the x-axis). So for example, the 0.95 credible level corresponds 
to a coverage of 0.882 and 0.072 for correlated and uncorrelated BEAMS respectively. Ideal 
coverage is the diagonal straight line across the plot, where (for example) 0.95 of the datasets 
would contain the input cosmology in the 0.95 credible contour. It is clear that the coverage 
of correlated BEAMS is close to ideal and far superior to that of uncorrelated BEAMS. 




Credible level containing input cosmology 

Figure 5. Effective coverage, defined to be the fraction of datasets which contain the input cosmology 
at a given credible level for the 5000 datasets shown in figure 4 for both uncorrelated BEAMS (blue, 
dashed line), correlated BEAMS (red, solid line) and for the ideal coverage case (black, dotted line). 
While correlated BEAMS is close to ideal, implying that its contours can be trusted, uncorrelated 
BEAMS dramatically undercovers and hence the contours cannot be trusted for correlated data, as 
expected. 

6 A perturbative expansion of the BEAMS posterior 
6.1 Theory 

For future surveys, in exactly the limit where it will be important to apply BEAMS to avoid 
biases from probability cuts, the light curve data for both candidates and templates will 
be excellent and the probability that most candidates are a la will therefore be close to 
zero or unity. This is illustrated in figure 2, which shows simulated DES probabilities using 
MLCS2k2 [8] and SNANA [49]. 

In this limit of abundant, high-quality data, we may expect we could perform a perturbative 
expansion of the full correlated BEAMS posterior, to find an approximation to the full 
posterior as an alternative to the numerical marginalisation over type. If we write eq. (2.4) 



-13- 



as a function of the probabilities, we can Taylor expand the posterior around the point 
where the probabilities are rounded to either zero or one {PRnd)'- 



P{e\D,p) oc P{9\D,p) + e • VP{e\D,p) 

PRnd 



^ 1_, 

PRnd 2 



e-H{P{e\D,p)) 



PRnc 



+ ■■■ , (6.1) 



where Ci = pi — Rnd{pi) and Rnd{pi) is the rounded value of the probability of the i'th data 
point (which will be either or 1) and H is the Hessian matrix. 

Appendix A contains details of the derivation. The resulting posterior, up to the second 
order term, is 
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+ 



. (6.2) 



6.2 Results 



This perturbative approximation breaks down because we expand the posterior about the 
rounded off probabilities (that is, if the probability is close to 1 we take the object to be a 
la and a nia if it close to 0), but due to the extreme nature of the likelihood, this tends to 
lie quite far from the maximum likelihood, the point about which one would like to perform 
the expansion. This happens because crja/crnia is very small, hence if too many nia's are 
mistyped as la's when rounding off the probabilities, these terms cause higher order terms in 
the Taylor expansion to dominate, however we only include terms up to second order term 
in this expansion in order for the analysis remain computationally viable (the second order 
term is an calculation, the third order N"^ and so on). Thus the perturbative expansion 
is not a good enough approximation of the true likelihood. 

This can be seen in figures 6 and 7, where we show 1 and 2-a contours in the ^m—^A plane. In 
each case, the fiducial value is shown by a star. While the underlying data are correlated, the 
uncorrelated form of BEAMS assumes uncorrelated data. In addition, perturbative BEAMS 
cannot sufficiently correct for the mistyped terms, and thus fail to recover the simulated data 
model at > 3a. 

The failure of perturbative BEAMS to correct for mistyped terms can be understood further 
by considering figure 8. In the left panel, we show perturbative BEAMS successfully approx- 
imates the true likelihood, as does correlated BEAMS using numerical marginalisation, for 
a small dataset. The right panel, however, shows this is not the case for a large dataset. 
The perturbative BEAMS estimate for Q\ is biased, because mistyped objects in the dataset 
cause higher order terms in the perturbative approximation to dominate. As shown pre- 
viously, however, correlated BEAMS (with numerical marginalisation of types) accurately 
recovers the input cosmology. 
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Figure 6. Comparison between uncorrelated BEAMS (line contours), correlated BEAMS (filled, red 
contours) and the perturbative expansion of BEAMS (filled blue contours with dashed lines), for an 
uncorrelated dataset, showing how the perturbative expansion fails badly even in the uncorrelated 
case. 




Figure 7. Comparison between uncorrelated BEAMS (line contours), correlated BEAMS (filled, red 
contours) and the perturbative expansion of BEAMS (filled blue contours with dashed lines), for 
a correlated dataset, using the decaying diagonal covariance matrix, showing how the perturbative 
expansion fails badly 
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Figure 8. These plots demonstrate the difference between the true BEAMS hkehhood (dotted, 
black line), the perturbative approximation (dashed, red line) and correlated BEAMS with numerical 
marginalisation (solid, blue line). Left panel: A set of ten data points was created using the decaying 
covariance matrix to correlate the data. Qrn and Hq are set to 0.272 and 70.4 respectively and is 
allowed to vary. For this small dataset, perturbative BEAMS and correlated BEAMS are both good 
approximations to the true BEAMS likelihood. Right panel: A similar dataset except with 200 data 
points this time. Now, perturbative BEAMS poorly approximates the true likelihood and produces a 
bias in Qa. BEAMS with numerical marginalisation, however, is unbiased. The true likelihood cannot 
be calculated with such a large dataset. 



7 Conclusion 

Photometric supernova surveys with unprecedented amounts of data will provide an exciting 
opportunity to learn about the structure and evolution of the universe. Due to the vast 
number of supernova candidates and their extended redshift distribution in large future sur- 
veys, it will be difficult, if not impossible, to spectroscopically follow-up all candidates as is 
normally required in cosmological analyses. BEAMS is a rigorous statistical method which 
avoids biases while using all supernova candidates, together with the probability that a can- 
didate is a Type la supernova, derived from the multicolour lightcurves of the candidate. 
Until now, BEAMS has been applied assuming the supernovae are uncorrelated [6, 15, 17], 
an assumption which will be inappropriate for future surveys. Without this assumption, the 
analytical form of the BEAMS posterior is computationally unfeasible. If the uncorrelated 
form of BEAMS is applied to a dataset with correlated systematic uncertainties, the posterior 
for the cosmological parameters can be incorrectly estimated. 

To deal with this 'exponential catastrophe', we have explored two different approaches. The 
first marginalises over all the possible combinations of object type numerically instead of 
analytically, by including the types as discrete nuisance parameters in our MCMC chains, 
making it computationally efficient. We have shown, with three separate models of covari- 
ance matrices, that this algorithm successfully recovers the input cosmology in the correlated 
case without bias. In addition, we have shown with 5000 mock datasets that the correlated 
BEAMS credible contours are reliable estimates of the true error contours. This is something 
that cannot be easily reproduced without using the correlated BEAMS formalism. 
The second approach we considered was a perturbative expansion of the BEAMS posterior 
which typically fails because, when too many objects are mistyped, the higher order terms 
of the expansion (which are neglected due to computational constraints) dominate over the 
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lower order terms, even when the probabihties are very close to zero or one thus producing 
a biased posterior. However, with numerical marginalisation over types, correlations be- 
tween supernovae do not appear to be an impediment to using BEAMS in analysing future 
photometric supernova surveys. 
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A Perturbative BEAMS 
A.l Theory 

The aim is to derive a formula for the BEAMS posterior in the case where the probabilities 
are all close to zero and one. We start with the BEAMS posterior in the general case: 



P{6\D) = ^P{D\e,r)P{9)Pir)/P{D) 



(A.l) 



This results in 2^ terms being calculated, in the case where there are two types. 

We can Taylor expand this posterior, written as a function of the probabilities p (this is 

a length- vector containing the probability for each object), around the point where the 

probabilities are rounded to either zero or one. We define the vector e such that Ci = 

Pi — Rnd{pi)^ where Rnd{pi) is the rounded value of the probability of the i'th data point 

(which will be either or 1). So for example, if pi = 0.99 then ei = —0.01 and if pi = 0.02 

then ei — 0.02 etc. 

The BEAMS posterior becomes: 
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where H is the Hessian matrix. 
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A. 1.1 The first order term 

Now we will explicitly calculate the first order derivative term. The i'th component of 
V P{9\D^p) is given by 



dP{e\D,p) 



= J2Pimr)P{0) 



dP{i 



dpi Z_^- V I"' ■ /- v-y 

This derivative is trivial since P{t) is hnear in the piS. P{t) is defined as 



(A.3) 
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Since only one of these terms is a function of pi (where i equals either j or k depending on 
its type), the derivative is given by 

Here, P(t_^) is equivalent to eq. (A. 4), with the contribution to the product from i'th object's 
probability removed. Since P^. is either pi or (1 — Pi)^ its derivative is given by 

ifr.=la 

dvi \-l ifri = nla. 

Thus: 

= 5: ±Pm. r)Pi9)Pir-.) . (A.7) 

Once we subsitute for the rounded probabilities, many of the terms in P{f-i) will go to zero. 
In fact, each element of must equal its rounded type for P{r-i) in order to be non-zero, 
in which case, P{f-i) — 1. This leaves only two terms, corresponding to the two possible 
types of the i'th data point. Thus: 

dP{9\D,p) 



dpi 



= [P{D\9, n=i.) - P{D\9, T,^nia)] P{e) , (A.8) 

PRnd 

where r^^ia is the same as TRnd (where TRndi — la if Rnd{pi) — 1 and TRndi — nia if 
Rnd{pi) = 0) with the i'th element being Type la and r^^nia is the same except the i'th 
element is Type nia. 

Finally, the first order BEAMS posterior is given by 
P{e\D,e) (X P{e\D,p) + ^ (^e, X [P{D\e,n=i.) - P{D\e, n=ni.)] X P{e)] . (A.9) 

For N data points, the order of this calculation is 2N times the order of the likelihood 
calculation. So in the case where the likelihood is given by the usual Gaussian form 
this would be an order calculation. 

A. 1.2 The second order term 

Our treatment can be repeated for the second order term. The second order derivative is 
given by 

P{j) can again be expressed as a product of one z-dependent term, one j-dependent term 
and one term independent of both 

P(f) = P,,P..P_,, (A.ll) 



- 18 - 



if z 7^ j. li i — then the second derivative of P{t) wih be zero. 
By the product rule, this is 



dpidpj dpi dpj ^^'^ ^'^^ ' 



Which evaluates to 



dpidpj 



if i=j 

1 if i ^ j and Tj = tj 
^-1 iii^ j and n ^ tj. 



(A.12) 



(A.13) 



Similarly to the first order term, when we evaluate the full posterior (eq. (A. 10)) at the 
rounded probabilities, we find that only four terms remain (still for i ^ j): 



d^P{e\D,p) 
dpidpj 
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where 



nj=ia TRnd except with Ti Tj la 
rij=nid. = TRnd exccpt with Ti = la, Tj = nia 

Ti=la,j=nla = TRnd CXCCpt with Ti = nIa, Tj = la 
^i=nla,j=la = TR^d CXCCpt with Ti = Tj = ula. 

Putting this all together, the second order term of the BEAMS posterior is 
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Thus the second order term is order (4A^)^ times the order of the likelihood calculation. For 
a Gaussian likelihood, this is an order calculation. 
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